Claims



Claim 1. A rate control method based on rate distortion optimization (RDO), characterized by comprising:

Step 1: allocating bits to every picture in a GOP and, based on the allocated bits, using a
predicted quantization parameter to perform rate distortion optimization mode selection for
every macroblock in the current picture;

Step 2: using the information collected from the first rate distortion mode selection to
calculate a final quantization parameter for rate control and, if the final quantization parameter
differs from the predicted one, performing a second rate distortion mode selection.
Claim 2. The method according to claim 1, wherein, in step 1, before a GOP is coded, bits are
allocated to the pictures in the GOP using the average picture size.

Claim 3. The method according to claim 2, wherein the average picture size is calculated as
$R/F = R \div F$, where $R$ is the target bit rate, $F$ is the picture rate, and $R/F$ is the average picture size.
Claim 4. The method according to claim 1 or claim 2, wherein the bit allocation is adjusted within
the GOP being coded. The adjustment is implemented as follows:

Here, $T_i$, $T_p$ and $T_b$ are the bits allocated to the I, P and B frame respectively. $N_i$, $N_p$ and $N_b$ are the
numbers of remaining uncoded I, P and B frames in the GOP respectively. $X_i$, $X_p$ and $X_b$ are the global complexity
estimates for the I, P and B frame respectively, each defined as the product of the coded bits
and the average quantization parameter of the frame.

bit_rate is the target bit rate; picture_rate is the frame rate.

$K_p$ and $K_b$ are constants giving the complexity ratio of the P frame and the B frame, respectively,
to the I frame.

$R$ is the number of remaining bits for the GOP; after a picture is coded it is updated as follows:

$$R = R - S_{i,p,b}$$

$S_{i,p,b}$ is the number of bits used to code the current frame.
Claim 5. The method according to claim 4, wherein, before a GOP is coded, the remaining bits for
the current GOP are initialized as follows:

$$R = G + R_{prev}$$

$$G = bit\_rate \times N \div picture\_rate$$

Here, $R$ is the number of remaining bits for the current GOP.
$N$ is the number of frames in the current GOP.
$G$ is the number of bits for a GOP.

$R_{prev}$ is the number of bits remaining from the previous GOP. For the first GOP, $R_{prev} = 0$.
Claim 6. The method according to claim 4, wherein $X_i$, $X_p$ and $X_b$ are initialized as:

$$X_i = a \times bit\_rate$$
$$X_p = b \times bit\_rate$$
$$X_b = c \times bit\_rate$$

Here, $a$, $b$ and $c$ are constants.

bit_rate is the target bit rate.

Claim 7. The method according to claim 1, wherein step 1 further includes at least one rate distortion
optimization based mode selection with a predicted quantization parameter. The predicted
quantization parameter may be the quantization parameter of the previous macroblock, or may be decided by
the rate distortion model of a rate control scheme. The mode minimizing the following expression is
selected as the initial coding mode for the current macroblock:

$$D(s, c, MODE \mid QP) + \lambda_{MODE} \, R(s, c, MODE \mid QP)$$

Here, $s$ is the luma value of the original macroblock, $c$ is the luma value of the reconstructed
macroblock, and $\lambda_{MODE}$ is the Lagrangian constant.

For I/P frame, $\lambda_{MODE} = 0.85 \times 2^{(\ldots)}$;

For B frame, $\lambda_{MODE} = 4 \times 0.85 \times 2^{(\ldots)}$.

$D(s, c, MODE \mid QP)$ evaluates the distortion of the current macroblock after it is
coded with mode MODE.

$R(s, c, MODE \mid QP)$ is the number of bits used to code the macroblock with mode MODE.
$QP$ is the quantization parameter for the current macroblock.

Claim 8. The method according to claim 7, wherein, for motion estimation in a P or B frame, the motion vector
minimizing the following expression is selected as the motion vector for the current macroblock:

$$J(m, \lambda_{MOTION}) = \mathrm{SA(T)D}(s, c(m)) + \lambda_{MOTION} \, R(m - p)$$

Here, $D(s, c(m))$ evaluates the distortion from motion compensation.
SA(T)D is the sum of absolute differences after prediction (or after the Hadamard transform) for the
macroblock.

$R(m - p)$ is the number of bits used to code the motion vector.

$s$ is the luma value of the current macroblock in the original frame.

$c$ is the luma value in the reference picture.

$m$ is the motion vector.

$p$ is the predicted motion vector.

$\lambda_{MOTION}$ is the Lagrangian constant for motion estimation, and $\lambda_{MOTION} = \sqrt{\lambda_{MODE}}$.

$\lambda_{MODE}$ is the Lagrangian constant for mode selection.

Claim 9. The method according to claim 2, wherein, after the first rate distortion mode selection, the RDO based rate
control further includes calculating a quantization parameter for the current macroblock. The
quantization parameter is adjusted according to the macroblock activity and the buffer status.

Claim 10. The method according to claim 9, wherein the quantization parameter for the macroblock is adjusted
according to the macroblock activity. After the first rate distortion mode selection, the sum of
absolute differences is used as the macroblock activity estimate. The macroblock activity is
calculated as:

$$act_m = \sum_{i,j} \left| s(i,j) - c(i,j) \right|$$

$$N\_act_m = \frac{(2 \times act_m) + avg\_act}{act_m + (2 \times avg\_act)}$$

Here, $i$ is the horizontal position of the pixel in the current macroblock and $j$ is its vertical
position. $N\_act_m$ is the activity of the current macroblock.
$s(i,j)$ is the luma value of the original pixel $(i,j)$, and $c(i,j)$ is the prediction value of pixel $(i,j)$. $avg\_act$ is
the average $act_m$ in the previously coded picture of the same type as the current picture.
$act_m$ is the sum of absolute differences after motion compensation or intra prediction.

Claim 11. The method according to claim 9, wherein a virtual buffer is used for rate control. A
mapping from the virtual buffer occupancy to the macroblock quantization parameter is first set up, and the final
macroblock quantization parameter is calculated as:

$$r = 2 \times bit\_rate / picture\_rate$$

Here, $Q_m$ is the quantization parameter of the current macroblock.

$d_m$ is the current buffer occupancy, and it equals $d_m^i$, $d_m^p$ and $d_m^b$ for the I, P and B frame
respectively.

$B_{m-1}$ is the number of bits used to code the previous macroblock.

$d_0^n$ is the initial buffer occupancy for the current frame, where $n$ is $i$, $p$ or $b$, corresponding to $d_0^i$, $d_0^p$ and $d_0^b$.

$r$ is the size of the virtual buffer.

Claim 12. The method according to claim 11, wherein, when the first frame is coded, the virtual buffer occupancy is
initialized with:

$$d_0^i = 10 \times r / 31$$

Here, $r$ is the virtual buffer size; $d_0^i$, $d_0^p$ and $d_0^b$ are the initial virtual buffer occupancies for the I, P
and B frame. $K_p$ is the complexity ratio between the I and P frame; $K_b$ is the complexity ratio between the I and B
frame.

Claim 13. The method according to any one of claims 2, 9, 10, 11 and 12, wherein the RDO based rate control also includes a
second RDO mode selection after the final quantization parameter for the current
macroblock has been calculated. That is to say, the selected quantization parameter for the current macroblock is
used to perform RDO mode selection again. The mode which minimizes the following expression is
selected as the coding mode for the current macroblock:

$$D(s, c, MODE \mid QP) + \lambda_{MODE} \, R(s, c, MODE \mid QP)$$

Here, $s$ is the luma value of the original macroblock, $c$ is the luma value of the reconstructed
macroblock, and $\lambda_{MODE}$ is the Lagrangian constant.

For I/P frame, $\lambda_{MODE} = 0.85 \times 2^{(\ldots)}$;

For B frame, $\lambda_{MODE} = 4 \times 0.85 \times 2^{(\ldots)}$.

$D(s, c, MODE \mid QP)$ evaluates the distortion of the current macroblock coded with
mode MODE.

$R(s, c, MODE \mid QP)$ is the number of bits used to code the macroblock with mode MODE.
$QP$ is the quantization parameter for the current macroblock.

Claim 14. The method according to claim 13, wherein, for motion estimation in a P or B frame, the motion vector
minimizing the following expression is selected for the current macroblock:

$$J(m, \lambda_{MOTION}) = \mathrm{SA(T)D}(s, c(m)) + \lambda_{MOTION} \, R(m - p)$$

Here, $D(s, c(m))$ evaluates the distortion from motion compensation.
SA(T)D is the sum of absolute differences (or after the Hadamard transform) for the macroblock.
$R(m - p)$ is the number of bits used to code the motion vector.
$s$ is the luma value of the current macroblock in the original frame.
$c$ is the luma value in the reference picture.
$m$ is the motion vector.
$p$ is the predicted motion vector.

$\lambda_{MOTION}$ is the Lagrangian constant for motion estimation, and $\lambda_{MOTION} = \sqrt{\lambda_{MODE}}$.
$\lambda_{MODE}$ is the Lagrangian constant for mode selection.

Claim 15. A rate control apparatus based on rate distortion optimization, comprising the
following modules: a video encoder module (for example, an H.264 encoder module or JVT
processing module), a rate distortion optimization mode selection and adaptive quantization module,
a virtual buffer, and a global complexity estimation module; wherein the JVT processing module receives the
input frame and is connected with the RDO mode selection module, the virtual buffer module and the global
complexity estimation module.

The RDO mode selection and adaptive quantization module is connected with the virtual buffer and the
global complexity estimation module. It receives the input signal from the JVT processing module,
processes it based on the status of the virtual buffer module and the global complexity module, and then
calculates the quantization parameter for the macroblock. Finally, the JVT processing module
outputs the final coded macroblock using the calculated parameter.

Claim 16. The apparatus according to claim 15, wherein, before a GOP is coded, bits are allocated to the pictures in
the GOP using the average picture size.

Claim 17. The apparatus according to claim 16, wherein the average picture size is calculated as:

$R/F = R \div F$, where $R$ is the target bit rate, $F$ is the picture rate, and $R/F$ is the average picture size.

Claim 18. The apparatus according to claim 16 or 17, wherein the bit allocation is adjusted within the GOP. The
adjustment is as follows:

Here, $T_i$, $T_p$ and $T_b$ are the bits allocated to the I, P and B frame respectively. $N_i$, $N_p$ and $N_b$ are the
numbers of remaining uncoded I, P and B frames in the GOP respectively. $X_i$, $X_p$ and $X_b$ are the global complexity
estimates for the I, P and B frame respectively, each defined as the product of the coded bits
and the average quantization parameter of the frame.

bit_rate is the target bit rate; picture_rate is the frame rate.

$K_p$ and $K_b$ are constants giving the complexity ratio of the P frame and the B frame, respectively,
to the I frame.

$R$ is the number of remaining bits for the GOP; after a picture is coded it is updated as follows:

$$R = R - S_{i,p,b}$$

$S_{i,p,b}$ is the number of bits used to code the current frame.
Claim 19. The apparatus according to claim 18, wherein, before a GOP is coded, the remaining bits for the
current GOP are initialized as follows:

$$R = G + R_{prev}$$

$$G = bit\_rate \times N \div picture\_rate$$

Here, $R$ is the number of remaining bits for the current GOP.

$N$ is the number of frames in the current GOP.
$G$ is the number of bits for a GOP.

$R_{prev}$ is the number of bits remaining from the previous GOP. For the first GOP, $R_{prev} = 0$.
Claim 20. The apparatus according to claim 18, wherein $X_i$, $X_p$ and $X_b$ are initialized as:

$$X_i = a \times bit\_rate$$

$$X_p = b \times bit\_rate$$

$$X_b = c \times bit\_rate$$

Here, $a$, $b$ and $c$ are constants.
bit_rate is the target bit rate.

Claim 21. The apparatus according to claim 15, wherein mode selection is performed using the quantization parameter
of the previous macroblock as the predicted value for the current macroblock. The mode minimizing the
following expression is selected as the initial coding mode for the current macroblock:

$$D(s, c, MODE \mid QP) + \lambda_{MODE} \, R(s, c, MODE \mid QP)$$

Here, $s$ is the luma value of the original macroblock, $c$ is the luma value of the reconstructed
macroblock, and $\lambda_{MODE}$ is the Lagrangian constant.

For I/P frame, $\lambda_{MODE} = 0.85 \times 2^{(\ldots)}$;

For B frame, $\lambda_{MODE} = 4 \times 0.85 \times 2^{(\ldots)}$.

$D(s, c, MODE \mid QP)$ evaluates the distortion of the current macroblock coded with
mode MODE.

$R(s, c, MODE \mid QP)$ is the number of bits used to code the macroblock with mode MODE.

$QP$ is the quantization parameter for the current macroblock.
Claim 22. The apparatus according to claim 21, wherein, for motion estimation in a P or B frame, the motion vector
minimizing the following expression is selected for the current macroblock:

$$J(m, \lambda_{MOTION}) = \mathrm{SA(T)D}(s, c(m)) + \lambda_{MOTION} \, R(m - p)$$

Here, $D(s, c(m))$ evaluates the distortion from motion compensation.
SA(T)D is the sum of absolute differences (or after the Hadamard transform) for the macroblock.
$R(m - p)$ is the number of bits used to code the motion vector.
$s$ is the luma value of the current macroblock in the original frame.
$c$ is the luma value in the reference picture.
$m$ is the motion vector.
$p$ is the predicted motion vector.

$\lambda_{MOTION}$ is the Lagrangian constant for motion estimation, and $\lambda_{MOTION} = \sqrt{\lambda_{MODE}}$.
$\lambda_{MODE}$ is the Lagrangian constant for mode selection.

Claim 23. The apparatus according to claim 22, wherein, after the first rate distortion mode selection, the rate control
scheme further includes calculating a new quantization parameter and adjusting it according to the
macroblock activity and the buffer status.

Claim 24. The apparatus according to claim 22, wherein, for adjusting the quantization parameter of the current macroblock,
the sum of absolute differences is used as the macroblock activity estimate after the first rate
distortion mode selection. The macroblock activity is calculated as:

$$act_m = \sum_{i,j} \left| s(i,j) - c(i,j) \right|$$

$$N\_act_m = \frac{(2 \times act_m) + avg\_act}{act_m + (2 \times avg\_act)}$$

Here, $i$ is the horizontal position of the pixel in the current macroblock and $j$ is its vertical
position. $N\_act_m$ is the activity of the current macroblock.
$s(i,j)$ is the luma value of the original pixel $(i,j)$, and $c(i,j)$ is the prediction value of pixel $(i,j)$. $avg\_act$ is
the average $act_m$ in the previously coded picture of the same type as the current picture.
$act_m$ is the sum of absolute differences after motion compensation or intra prediction.

Claim 25. The apparatus according to claim 22, wherein a virtual buffer is used for rate control. A mapping
from the virtual buffer occupancy to the macroblock quantization parameter is first set up. The macroblock
quantization parameter is calculated as:

Here, $Q_m$ is the quantization parameter of the current macroblock.
$d_m$ is the current buffer occupancy, and it equals $d_m^i$, $d_m^p$ and $d_m^b$
for the I, P and B frame respectively.
$B_{m-1}$ is the number of bits used to code the previous macroblock.

$d_0^n$ is the initial buffer occupancy for the current frame, where $n$ is $i$, $p$ or $b$, corresponding to $d_0^i$, $d_0^p$ and $d_0^b$.

$r$ is the size of the virtual buffer.

Claim 26. The apparatus according to claim 25, wherein, when the first frame is coded, the virtual buffer occupancy is
initialized with:

$$d_0^i = 10 \times r / 31$$

Here, $r$ is the virtual buffer size; $d_0^i$, $d_0^p$ and $d_0^b$ are the initial virtual buffer occupancies for the I, P
and B frame. $K_p$ is the complexity ratio between the I and P frame; $K_b$ is the complexity ratio between the I and B
frame.

Claim 27. The apparatus according to any one of claims 23, 24, 25 and 26, wherein the RDO based rate control also includes a
second RDO mode selection after the quantization parameter for the current macroblock has been decided. That
is to say, the decided quantization parameter for the current macroblock is used to perform RDO
mode selection again. The mode which minimizes the following expression is selected as
the coding mode for the current macroblock:

$$D(s, c, MODE \mid QP) + \lambda_{MODE} \, R(s, c, MODE \mid QP)$$

Here, $s$ is the luma value of the original macroblock, $c$ is the luma value of the reconstructed
macroblock, and $\lambda_{MODE}$ is the Lagrangian constant.

For I/P frame, $\lambda_{MODE} = 0.85 \times 2^{(\ldots)}$;

For B frame, $\lambda_{MODE} = 4 \times 0.85 \times 2^{(\ldots)}$.

$D(s, c, MODE \mid QP)$ evaluates the distortion of the current macroblock after it is
coded.

$R(s, c, MODE \mid QP)$ is the number of bits used to code the macroblock with mode MODE.

$QP$ is the quantization parameter for the current macroblock.
Claim 28. The apparatus according to claim 27, wherein, for motion estimation in a P or B frame, the motion vector that minimizes
the following expression is selected as the motion vector for the current macroblock:

$$J(m, \lambda_{MOTION}) = \mathrm{SA(T)D}(s, c(m)) + \lambda_{MOTION} \, R(m - p)$$

Here, $D(s, c(m))$ evaluates the distortion from motion compensation.
SA(T)D is the sum of absolute differences (or after the Hadamard transform) for the macroblock.
$R(m - p)$ is the number of bits used to code the motion vector.
$s$ is the luma value of the current macroblock in the original frame.
$c$ is the luma value in the reference picture.
$m$ is the motion vector.
$p$ is the predicted motion vector.

$\lambda_{MOTION}$ is the Lagrangian constant for motion estimation, and $\lambda_{MOTION} = \sqrt{\lambda_{MODE}}$.
$\lambda_{MODE}$ is the Lagrangian constant for mode selection.

Claim 29. The apparatus according to claim 28, wherein the quantization parameter from the RDO mode selection and
adaptive quantization module is sent back to the JVT processing module, and the macroblock is coded and
output by the JVT processing module.



Description 



A Method and Apparatus for Rate Distortion Optimization Based Rate Control 
BACKGROUND OF THE INVENTION: 



Advanced video coding techniques are important for multimedia storage and transmission, and
many video coding standards have been developed for this reason. H.264 is the latest of them.
The H.264/AVC standard, jointly developed by ISO and ITU-T through the Joint Video Team (JVT) and
also known as MPEG-4 Part 10, substantially outperforms the previous video coding standards by
using a variety of temporal and spatial predictions. Rate control is an important technique
although it is not a normative part of video coding standards. Without rate control, a video
coding scheme would be practically useless in many applications, because the client buffer may
often underflow or overflow when the compressed stream is delivered over a channel of constant
bandwidth. Therefore, every video coding standard has its own rate control technique, for
example TM5 for MPEG-2 and TMN8 for H.263.

RDO is an important video coding technique. It is used to select the optimal motion vectors and
the optimal coding mode for every macroblock. Yet the RDO used in the H.264 test model makes it
difficult to adopt the existing rate control techniques, because rate control usually requires a
pre-determined set of motion vectors and coding modes in order to select the quantization
parameter, whereas RDO requires a pre-determined quantization parameter in order to select motion
vectors and coding modes. In addition, to account for the complexity ratio between coded frames,
the bit allocation model and the adaptive quantization scheme should also be improved. The
invention is a method and apparatus for rate distortion optimization based rate control. The
invention can be used for video streaming, transmission, and storage coding.

SUMMARY OF THE INVENTION: 

The invention provides a method and apparatus of rate control for a video encoder, in
which the rate distortion optimization technique is used to improve coding efficiency.

As shown in Figure 2, a rate distortion optimization based rate control implementation
includes the following modules: a JVT processing module, a rate distortion optimization based macroblock
mode selection module, a virtual buffer, and a global complexity estimation module.

The JVT processing module receives the input frame data and is connected with the RDO mode
selection module, the virtual buffer module and the global complexity estimation module.

The RDO mode selection module is connected with the virtual buffer and the global complexity
estimation module. It receives the input signal from the JVT processing module and processes it based
on the status of the virtual buffer module and the global complexity module. Finally, the output signal is
sent back to the JVT processing module, which outputs the final coded macroblock.

Before a GOP is coded, bits are allocated to the pictures in the GOP using the average picture
size. The average picture size is calculated as:

$R/F = R \div F$, where $R$ is the target bit rate, $F$ is the picture rate, and $R/F$ is the average picture size.

The bit allocation adjustment within the GOP being coded is as follows:

Here, $T_i$, $T_p$ and $T_b$ are the bits allocated to the I, P and B frame respectively. $N_i$, $N_p$ and $N_b$ are the
numbers of remaining uncoded I, P and B frames in the GOP respectively. $X_i$, $X_p$ and $X_b$ are the global complexity
estimates for the I, P and B frame respectively, each defined as the product of the coded bits
and the average quantization parameter of the frame.

bit_rate is the target bit rate; picture_rate is the frame rate.

$K_p$ and $K_b$ are constants giving the complexity ratio of the P frame and the B frame, respectively,
to the I frame.

$R$ is the number of remaining bits for the GOP; after a picture is coded it is updated as follows:

$$R = R - S_{i,p,b}$$

$S_{i,p,b}$ is the number of bits used to code the current frame.

Before a GOP is coded, the remaining bits for the current GOP are initialized as follows:

$$R = G + R_{prev}$$

$$G = bit\_rate \times N \div picture\_rate$$

Here, $R$ is the number of remaining bits for the current GOP.

$N$ is the number of frames in the current GOP.
$G$ is the number of bits for a GOP.

$R_{prev}$ is the number of bits remaining from the previous GOP. For the first GOP, $R_{prev} = 0$.

$X_i$, $X_p$ and $X_b$ are initialized as:

$$X_i = a \times bit\_rate$$
$$X_p = b \times bit\_rate$$
$$X_b = c \times bit\_rate$$

Here, $a$, $b$ and $c$ are constants.

bit_rate is the target bit rate.
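The bookkeeping above can be sketched in a few lines of Python. The functions `gop_bit_budget`, `update_remaining` and `initial_complexities` follow the formulas stated in the text. The function `frame_target` is an assumption: the allocation formula for $T_i$, $T_p$ and $T_b$ is not legible in this text, so the sketch substitutes the well-known TM5-style targets, which use the same symbols. All function names are hypothetical.

```python
def gop_bit_budget(bit_rate, picture_rate, n_frames, r_prev=0.0):
    """Return R = G + R_prev, with G = bit_rate * N / picture_rate."""
    g = bit_rate * n_frames / picture_rate
    return g + r_prev


def update_remaining(r, coded_bits):
    """After a picture is coded: R = R - S."""
    return r - coded_bits


def initial_complexities(bit_rate, a, b, c):
    """X_i, X_p, X_b initialized as a, b, c times the target bit rate."""
    return a * bit_rate, b * bit_rate, c * bit_rate


def frame_target(kind, r, n_p, n_b, x_i, x_p, x_b, k_p, k_b,
                 bit_rate, picture_rate):
    """Target bits for the next frame of type kind ('i', 'p' or 'b').

    ASSUMPTION: TM5-style formulas, not taken from this text.
    n_p and n_b count the remaining uncoded P and B frames.
    """
    floor = bit_rate / (8.0 * picture_rate)  # lower bound used by TM5
    if kind == "i":
        share = 1.0 + n_p * x_p / (x_i * k_p) + n_b * x_b / (x_i * k_b)
    elif kind == "p":
        share = n_p + n_b * k_p * x_b / (k_b * x_p)
    elif kind == "b":
        share = n_b + n_p * k_b * x_p / (k_p * x_b)
    else:
        raise ValueError("kind must be 'i', 'p' or 'b'")
    return max(r / share, floor)
```

For example, a 15-frame GOP at 1 Mbit/s and 25 frames per second gives a budget of 600000 bits, from which the bits of each coded frame are then subtracted.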

Mode selection is performed using the quantization parameter of the previous macroblock as the
predicted value for the current macroblock. The mode that minimizes the following expression is
selected as the initial coding mode for the current macroblock:

$$D(s, c, MODE \mid QP) + \lambda_{MODE} \, R(s, c, MODE \mid QP)$$

Here, $s$ is the luma value of the original macroblock, $c$ is the luma value of the reconstructed
macroblock, and $\lambda_{MODE}$ is the Lagrangian constant.

For I/P frame, $\lambda_{MODE} = 0.85 \times 2^{(\ldots)}$;

For B frame, $\lambda_{MODE} = 4 \times 0.85 \times 2^{(\ldots)}$.

$D(s, c, MODE \mid QP)$ evaluates the distortion of the current macroblock after it is
coded.

$R(s, c, MODE \mid QP)$ is the number of bits used to code the macroblock with mode MODE.

$QP$ is the quantization parameter for the current macroblock.
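A minimal sketch of this Lagrangian mode decision follows. It assumes that the distortion and bit count of every candidate mode have already been measured at the given QP. The exponent of the Lagrangian constant is not legible in this text, so the sketch assumes `qp / 3`, the form used by early H.264 test models; the names `lambda_mode` and `select_mode` and the mode labels in the usage note are hypothetical.

```python
def lambda_mode(qp, is_b_frame=False):
    """Lagrangian constant for mode selection.

    ASSUMPTION: the exponent qp / 3 is not taken from this text.
    The factor 4 for B frames is as stated in the text.
    """
    lam = 0.85 * 2.0 ** (qp / 3.0)
    return 4.0 * lam if is_b_frame else lam


def select_mode(candidates, qp, is_b_frame=False):
    """Pick the mode minimizing D + lambda_MODE * R.

    candidates maps a mode name to (distortion, bits), both measured
    with the macroblock coded in that mode at quantization parameter qp.
    """
    lam = lambda_mode(qp, is_b_frame)
    return min(candidates,
               key=lambda mode: candidates[mode][0] + lam * candidates[mode][1])
```

With `candidates = {"INTRA16": (100, 50), "INTER16": (120, 20)}` and `qp = 0`, the costs are 142.5 and 137.0, so `INTER16` is chosen even though its distortion is higher.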
For motion estimation in a P or B frame, the motion vector that minimizes the following expression is
selected as the motion vector for the current macroblock:

$$J(m, \lambda_{MOTION}) = \mathrm{SA(T)D}(s, c(m)) + \lambda_{MOTION} \, R(m - p)$$

Here, $D(s, c(m))$ evaluates the distortion from motion compensation.
SA(T)D is the sum of absolute differences (or after the Hadamard transform) for the macroblock.
$R(m - p)$ is the number of bits used to code the motion vector.
$s$ is the luma value of the current macroblock in the original frame.
$c$ is the luma value in the reference picture.
$m$ is the motion vector.
$p$ is the predicted motion vector.

$\lambda_{MOTION}$ is the Lagrangian constant for motion estimation, and $\lambda_{MOTION} = \sqrt{\lambda_{MODE}}$.

$\lambda_{MODE}$ is the Lagrangian constant for mode selection.

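The motion vector cost can be illustrated with a small Python sketch. It uses the plain sum of absolute differences as the distortion term and, as an assumption, models $R(m - p)$ with the length of a signed Exp-Golomb code for each component of the vector difference; the text does not say how the motion vector bits are counted. The names `se_bits`, `sad` and `select_motion_vector` are hypothetical.

```python
import math


def se_bits(v):
    """Length in bits of a signed Exp-Golomb code for integer v.

    ASSUMPTION: stand-in for R(m - p); not specified in the text.
    """
    code_num = 2 * abs(v) - (1 if v > 0 else 0)
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def sad(cur, ref):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_c, row_r in zip(cur, ref)
               for a, b in zip(row_c, row_r))


def select_motion_vector(cur, ref_block_at, candidates, pred_mv, lam_mode):
    """Pick the candidate m minimizing SAD(s, c(m)) + lambda_MOTION * R(m - p).

    ref_block_at(mv) returns the reference block displaced by mv.
    lambda_MOTION is the square root of lambda_MODE, as stated in the text.
    """
    lam_motion = math.sqrt(lam_mode)

    def cost(mv):
        bits = se_bits(mv[0] - pred_mv[0]) + se_bits(mv[1] - pred_mv[1])
        return sad(cur, ref_block_at(mv)) + lam_motion * bits

    return min(candidates, key=cost)
```

A larger Lagrangian constant favors vectors close to the prediction $p$, while a smaller one favors the vector with the lowest residual.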
After the first rate distortion mode selection, the output of the RDO mode selection module is
sent to the JVT processing module. A new quantization parameter is calculated by the JVT
processing module. The quantization parameter is adjusted according to the macroblock activity.

After the first rate distortion mode selection, the sum of absolute differences is used as the
macroblock activity estimate. The macroblock activity is calculated as:

$$act_m = \sum_{i,j} \left| s(i,j) - c(i,j) \right|$$

$$N\_act_m = \frac{(2 \times act_m) + avg\_act}{act_m + (2 \times avg\_act)}$$

Here, $i$ is the horizontal position of the pixel in the current macroblock and $j$ is its vertical position.
$N\_act_m$ is the activity of the current macroblock, $s(i,j)$ is the luma
value of the original pixel $(i,j)$, and $c(i,j)$ is the prediction value of pixel $(i,j)$. $avg\_act$ is the average $act_m$
in the previously coded picture of the same type as the current picture. $act_m$ is the sum of
absolute differences after motion compensation or intra prediction.
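As a small illustration of the two activity formulas, the sketch below computes $act_m$ from an original block and its prediction and then normalizes it against $avg\_act$. The function names are hypothetical, and the fallback value returned when both inputs are zero is an assumption added only to avoid a division by zero.

```python
def macroblock_activity(orig, pred):
    """act_m: sum of |s(i,j) - c(i,j)| over the macroblock."""
    return sum(abs(s - c)
               for row_s, row_c in zip(orig, pred)
               for s, c in zip(row_s, row_c))


def normalized_activity(act_m, avg_act):
    """N_act_m = (2 * act_m + avg_act) / (act_m + 2 * avg_act)."""
    denom = act_m + 2.0 * avg_act
    if denom == 0:
        return 1.0  # ASSUMPTION: neutral value when no activity is measured
    return (2.0 * act_m + avg_act) / denom
```

The normalized value lies between 0.5 and 2: a macroblock with average activity maps to 1, a perfectly predicted one approaches 0.5, and a very active one approaches 2.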

When the first frame is coded, the virtual buffer occupancy is initialized with:

$$d_0^i = 10 \times r / 31$$

Here, $r$ is the virtual buffer size; $d_0^i$, $d_0^p$ and $d_0^b$ are the initial virtual buffer occupancies for the I, P
and B frame. $K_p$ is the complexity ratio between the I and P frame; $K_b$ is the complexity ratio between the I and B
frame.
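The virtual buffer can be sketched as follows. The buffer size $r = 2 \times bit\_rate / picture\_rate$ is taken from claim 11 and the I-frame initialization from the formula above. Two parts are assumptions borrowed from TM5 because the corresponding formulas are not legible in this text: scaling the initial P and B occupancies by $K_p$ and $K_b$, and mapping occupancy to a quantization parameter as $d_m \times 31 / r \times N\_act_m$ clipped to the range 0 to 31. All function names are hypothetical.

```python
def virtual_buffer_size(bit_rate, picture_rate):
    """r = 2 * bit_rate / picture_rate (claim 11)."""
    return 2.0 * bit_rate / picture_rate


def initial_occupancy(r, k_p, k_b):
    """Initial virtual buffer occupancy per frame type."""
    d0_i = 10.0 * r / 31.0  # stated in the text
    # ASSUMPTION: TM5-style scaling for the P and B buffers.
    return {"i": d0_i, "p": k_p * d0_i, "b": k_b * d0_i}


def macroblock_qp(d_m, r, n_act, qp_min=0, qp_max=31):
    """Map buffer occupancy d_m and normalized activity to a QP.

    ASSUMPTION: TM5-style mapping and a 0..31 QP range.
    """
    q = d_m * 31.0 / r * n_act
    return max(qp_min, min(qp_max, round(q)))
```

Under these assumptions the initial I-frame occupancy corresponds to a quantization parameter of 10 for a macroblock of average activity.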

The RDO based rate control also includes a second RDO mode selection after the quantization
parameter for the current macroblock has been decided. That is to say, the decided quantization parameter
for the current macroblock is used to perform RDO mode selection again. The mode which minimizes
the following expression is selected as the coding mode for the current macroblock:

$$D(s, c, MODE \mid QP) + \lambda_{MODE} \, R(s, c, MODE \mid QP)$$

Here, $s$ is the luma value of the original macroblock, $c$ is the luma value of the reconstructed
macroblock, and $\lambda_{MODE}$ is the Lagrangian constant.

For I/P frame, $\lambda_{MODE} = 0.85 \times 2^{(\ldots)}$;

For B frame, $\lambda_{MODE} = 4 \times 0.85 \times 2^{(\ldots)}$.

$D(s, c, MODE \mid QP)$ evaluates the distortion of the current macroblock after it is
coded.

$R(s, c, MODE \mid QP)$ is the number of bits used to code the macroblock with mode MODE.
$QP$ is the quantization parameter for the current macroblock.

The quantization parameter from the RDO mode selection and adaptive quantization module is sent back to
the JVT processing module, and the macroblock is coded and output by the JVT processing module.
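The two-pass flow for one macroblock can be sketched as below. The callbacks `rdo_select` and `compute_final_qp` are hypothetical placeholders for the RDO mode selection and for the activity and buffer based quantization parameter calculation; their signatures are assumptions made for this illustration only.

```python
def code_macroblock(predicted_qp, rdo_select, compute_final_qp):
    """Two-pass RDO with rate control for one macroblock.

    rdo_select(qp) returns (mode, stats) for mode selection at qp.
    compute_final_qp(stats) returns the rate-controlled QP from the
    statistics of the first pass (for example the SAD of the chosen mode).
    Returns (mode, final_qp, number_of_rdo_passes).
    """
    mode, stats = rdo_select(predicted_qp)   # first pass, predicted QP
    final_qp = compute_final_qp(stats)       # activity and buffer based
    passes = 1
    if final_qp != predicted_qp:             # second pass only if QP changed
        mode, stats = rdo_select(final_qp)
        passes = 2
    return mode, final_qp, passes
```

When the predicted quantization parameter already equals the final one, the second mode selection is skipped, so the extra cost is paid only for macroblocks whose quantization parameter changes.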

Based on the above modules, the drawbacks of traditional rate control schemes are removed. Because
RDO and rate control are considered together, RDO based video coding can achieve accurate
control of the target bit rate while maintaining good coding performance.
Figure 1 is an apparatus for the invention.



10/521877

[Block diagram. Blocks: Input Frame; Intra Prediction; Motion Estimation; Motion Compensation; Transform and Quantization; Inverse Quantization and Inverse Transform; Entropy Coding; Loop Filter; Frame Buffer; RDO Mode Selection and Adaptive Quantization; Virtual Buffer; Global Complexity Estimation. Labels: a1, a2.]

Figure 2 is an implementation of the invention on the JVT encoder 